Introduction to Testing

Testing is easy to understand, but there is also an art to it: writing good tests often requires you to figure out which inputs are most likely to break your program.

In addition, tests can serve different purposes:

  • Testing for correctness
  • Testing for speed (benchmarking)
  • Testing for "Let's check I didn't fuck something up" (a.k.a. 'regression testing')
  • ...etc...

All of the above tests have their uses, but as a general rule of thumb a good test suite will include a range of inputs and multiple tests for each.

I would add one small caveat: if the documentation for a function says something like "does not work for strings", then although it is possible to write tests for strings, what would be the point? The documentation already tells you those tests will fail. Instead of writing tests for situations the code was not designed to handle, focus on realistic test cases.

Alright, let's write a super simple function that divides A by B:


In [2]:
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

Okay, this is where we need to put our ‘thinking hat’ on for a moment. The documentation for this function specifically states that A and B are supposed to be numbers, so instead of wasting time breaking the code with obviously bad inputs (e.g. strings), let's try to break it with valid inputs. In other words:

What are the possible integers/floats we can pass in that might break this function?

When dealing with numbers there are, I think, three basic tests that are almost always worth running:

  1. Negative Numbers
  2. Zero
  3. Positive Numbers

And in addition to those tests, we should also run tests for:

  1. Small inputs (e.g. 10/5)
  2. Very large inputs (e.g. 999342493249234234234234 / 234234244353452424)

You may remember, for example, that in lecture 21, while trying to optimise our is_prime function, we introduced some defects that showed up with small numbers.

Anyway, the point is that these five basic cases cover a lot of the situations you may face with numbers. Obviously you should run several tests for each of these basic cases. In addition to the basic tests, you should run more function-specific tests too; for example, if I have a function that returns the factors of n, then it would be wise to run a bunch of tests with prime numbers to check what happens there. You should also test highly composite numbers (e.g. 720, 1260). For our division function, a good additional test would be a numerator larger than the denominator and vice versa (e.g. try both 10/2 and 2/10). Zero is also a special case for division, but we have already listed it in the basic tests.
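As a sketch of the function-specific tests described above, here is a hypothetical `factors` helper (it is not defined anywhere else in these notes and uses simple trial division up to the square root of n via `math.isqrt`, available from Python 3.8) together with tests on primes, which should only have 1 and themselves as factors, and on the highly composite numbers 720 and 1260, which have 30 and 36 divisors respectively. A perfect square (36) is included too, since it is an easy place to accidentally count a factor twice.

```python
import math

def factors(n):
    """Hypothetical helper: return the positive divisors of n (n >= 1) in ascending order."""
    small, large = [], []
    for d in range(1, math.isqrt(n) + 1):  # divisors come in pairs (d, n // d), so stop at sqrt(n)
        if n % d == 0:
            small.append(d)
            if d != n // d:  # for a perfect square, don't list the square root twice
                large.append(n // d)
    return small + large[::-1]

# Primes: the only factors should be 1 and the number itself
print([factors(p) == [1, p] for p in (2, 3, 13, 97, 7919)])  # [True, True, True, True, True]

# Highly composite numbers: many divisors, so many chances to miss one
print(len(factors(720)), len(factors(1260)))  # 30 36

# Perfect square: 6 must appear exactly once
print(factors(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```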

Okay, so let's write our first tests:


In [3]:
# Function here...
print(divide(10, 2) == 5.0)
[divide(10.0, 2.0) == 5.0, divide(10, 2) == 5.0, divide(0, 1) == 0.0]


True
Out[3]:
[True, True, True]
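The cell above covers positive numbers and a zero numerator, but not negative numbers or the 2/10 case mentioned earlier. A sketch of those extra tests follows. One thing worth knowing when writing them: floats are stored in binary, so an exact `==` comparison can fail even when the function is correct. Python's standard `math.isclose` (default relative tolerance 1e-09) is one common way around that; the 0.3 / 0.1 case below is an illustrative example, not one from the original notes.

```python
import math

def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

# Negative numbers: every sign combination, expected values worked out by hand
negative_tests = [
    divide(-10, 2) == -5.0,
    divide(10, -2) == -5.0,
    divide(-10, -2) == 5.0,
]

# Numerator smaller than the denominator
small_over_large = divide(2, 10) == 0.2

# Binary floats: 0.3 / 0.1 gives 2.9999999999999996, so exact equality fails
# even though divide() is behaving correctly; isclose() allows a tiny tolerance.
exact = divide(0.3, 0.1) == 3.0
close = math.isclose(divide(0.3, 0.1), 3.0)

print(negative_tests, small_over_large, exact, close)  # [True, True, True] True False True
```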

Now, we know that X/0 raises a ZeroDivisionError; the question is what we want the result to be. Do we want Python to raise the error, or would we prefer it to do something else, such as return a number or perhaps a string?

Remember that errors are not bad. If Python throws an error when it gets zero as input, that’s totally fine, and in this case I’m happy with the error. This means I have to write a test case that expects an error to be raised. We can do that like so…


In [26]:
try:
    divide(1, 0)
    print(False) # Test failed: if the line above raises ZeroDivisionError, this line is never executed.
except ZeroDivisionError:
    print(True) # Test passed: dividing by zero raised the expected error.


True
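The try/except pattern above can be wrapped in a small function so that every "this input should raise an error" test becomes a one-liner. `expect_error` is a hypothetical helper, not part of Python or any testing library. Note that it only catches the error type you ask for: any other exception still propagates with its full traceback, which is usually what you want when a test goes wrong in an unexpected way.

```python
def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

def expect_error(func, error_type, *args):
    """Hypothetical helper: return True if func(*args) raises error_type.

    Returns False if no error is raised. Any *other* exception is not caught,
    so it still shows up with its full traceback.
    """
    try:
        func(*args)
    except error_type:
        return True   # the error we wanted: test passed
    return False      # no error at all: test failed

print(expect_error(divide, ZeroDivisionError, 1, 0))     # True
print(expect_error(divide, ZeroDivisionError, -2.5, 0))  # True
print(expect_error(divide, ZeroDivisionError, 1, 2))     # False
```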

Okay, next up we need to test large numbers. With small numbers we can easily work out the correct answer by hand, but for large divisions that’s not so easy.

Your first instinct here might be to say "use a calculator", and while that works, it only works in this very specific case. What we actually want is a more general solution that can handle all sorts of problems.

It turns out that sometimes building code that generates test cases is a lot easier than building the solver. In this particular example we can do just that...

Let's take a step back and ask ourselves what division actually is. The answer is that it is basically the inverse of multiplication. And so, we can write test cases for our function by "reverse engineering" the problem. We know from math that, for any non-zero y, the following is always true:

(y * y) / y = y
(x * y) / y = x

And so, as long as we have a function that multiplies correctly, we can be confident that our function gets the right answer to large division problems (up to the precision of a float, since divide returns a float) even though we do not know the right answer ourselves. In code:


In [29]:
x = 30202020202020202022424354265674567456
y = 95334534534543543543545435543543545345

# Jupyter only displays the last expression in a cell, so collect both checks in a list
[divide(y * y, y) == float(y), divide(x * y, y) == float(x)]


Out[29]:
[True, True]
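The reverse-engineering trick also lets a program generate test cases for us. The sketch below is one way to do it, assuming random inputs are acceptable: `check_division_by_multiplication` is a hypothetical helper that picks random integers x and y (y non-zero) and checks `divide(x * y, y)` against `float(x)`. Because `divide` returns a float, this checks the answer to float precision rather than exactly; it relies on CPython's int / int true division returning the correctly rounded float, which is also what `float(x)` produces. Using a fixed seed means any failing case can be reproduced.

```python
import random

def divide(a, b):
    """a, b are ints or floats. Returns a/b"""
    return a / b

def check_division_by_multiplication(trials=1000, digits=40, seed=0):
    """Hypothetical helper: test divide() on randomly generated problems.

    For random integers x and y (y != 0), the exact answer to (x * y) / y is x,
    so we never need to know the answer in advance. Returns None if every
    trial passes, otherwise the first failing (x, y) pair.
    """
    rng = random.Random(seed)  # fixed seed, so a failure can be reproduced
    limit = 10 ** digits
    for _ in range(trials):
        x = rng.randint(-limit, limit)
        y = rng.randint(1, limit)  # y must not be zero
        if divide(x * y, y) != float(x):
            return (x, y)
    return None

print(check_division_by_multiplication())  # None means all 1000 random tests passed
```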

Most of the time, however, the code you want to test will not be so easily reverse-engineered, so most of your tests are going to be hand-written. And because writing tests can be tedious and time-consuming, you are going to want tests that are:

  • Fast to write and execute
  • Easy to understand: they should give clear feedback on what went wrong
  • Test most/all of the likely scenarios.

For these reasons, it's often a good idea to write tests that follow a common format. Great tests are often tests that you can copy, paste, and turn into a new test by changing a small handful of values.

To illustrate that, let's suppose I have the following code:


In [27]:
def firstMissingPositive(nums):
    """
    Given an unsorted integer array (nums) finds the smallest missing positive integer.
    
    for example: 
    [0,1,2,3]  => returns 4, since 4 is the smallest missing number
    [10,11,12] => returns 1
    """
    return 3

This is actually a hard problem to solve efficiently, but I don't care about that. Right now, I only care about testing, and this is a function that is easy to test.

Test-Driven Development

Sometimes, software developers write tests before they actually write the solution to their problem. This is called "test-driven development". The advantage of writing tests first is that it forces you to think about the problem in a different way: instead of thinking about how to solve the problem, we start out by thinking about which inputs are difficult. Sometimes that means we spot problems sooner than we would have otherwise.

Okay, let's write some tests!


In [26]:
print(firstMissingPositive([1,2,3]) == 4)
print(firstMissingPositive([0,0,1]) == 2)
print(firstMissingPositive([1,2]) == 3)


False
False
True

So now we have some tests; if "False" gets printed, that means the test failed. This is a good start. Notice how these tests are easy to write and understand. We can also quickly add tests by copy & paste plus some tweaks. On the downside, the output is not very informative: why did a test fail? Here our only option is to figure out which test failed and then re-run it, this time making a note of the value.


In [5]:
print("Got:", firstMissingPositive([1,2,3]))
print("should be 4")


Got: 3
should be 4

Okay, so if we want better test output, we should probably write some sort of 'test framework'. For example:


In [28]:
## Test
def print_test_result(func, input_val, expected_val):
    """Call func(input_val) and report whether the result matches expected_val."""
    result = func(input_val)
    if result == expected_val:
        print("TEST PASSED")
    else:
        print(f"TEST FAILED (Got: {result} Expected: {expected_val}, Input was: {input_val})")


##### TESTS GO HERE #####
print_test_result(firstMissingPositive, [1,2,3], 4)
print_test_result(firstMissingPositive, [0,0,1], 2)
print_test_result(firstMissingPositive, [1,2], 3)
print_test_result(firstMissingPositive, [7,6,5,4,3,2], 1)
print_test_result(firstMissingPositive, [1,2,4], 3)


TEST FAILED (Got: 3 Expected: 4, Input was: [1, 2, 3])
TEST FAILED (Got: 3 Expected: 2, Input was: [0, 0, 1])
TEST PASSED
TEST FAILED (Got: 3 Expected: 1, Input was: [7, 6, 5, 4, 3, 2])
TEST PASSED

Okay, so now that we have spent a little bit of time working on a test framework, we can (a) quickly write new tests and (b) clearly see why a test failed.
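One way to push the "quickly write new tests" idea further is to keep the tests as data: a list of (input, expected) pairs plus a loop, so adding a test means adding one row. `run_test_table` is a hypothetical extension of print_test_result (not part of any library) that only prints failures and then a pass count; it is run here against the placeholder version of firstMissingPositive that always returns 3.

```python
def firstMissingPositive(nums):
    """Placeholder from above: always returns 3 (no real solution yet)."""
    return 3

def run_test_table(func, cases):
    """Hypothetical helper: run func on every (input, expected) pair, print a
    line for each failure, and return (number_passed, number_run)."""
    passed = 0
    for input_val, expected_val in cases:
        result = func(input_val)
        if result == expected_val:
            passed += 1
        else:
            print(f"TEST FAILED (Got: {result} Expected: {expected_val}, Input was: {input_val})")
    print(f"{passed} of {len(cases)} tests passed")
    return passed, len(cases)

# Each row is one test: (input, expected result)
cases = [
    ([1, 2, 3], 4),
    ([0, 0, 1], 2),
    ([1, 2], 3),
    ([7, 6, 5, 4, 3, 2], 1),
    ([1, 2, 4], 3),
]
run_test_table(firstMissingPositive, cases)  # 3 failures printed, then "2 of 5 tests passed"
```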

In test-driven development, once you have a small selection of tests, you then try to write code that passes them. Let's do that now...


In [30]:
def firstMissingPositive(nums):
    """
    Given an unsorted integer array (nums) finds the smallest missing positive integer.
    
    for example: 
    [0,1,2,3]  => returns 4, since 4 is the smallest missing number
    [10,11,12] => returns 1
    """
    i = 1
    while True:
        i += 1
        if i not in nums:
            return i

Okay, now that we have a solution, let's run the tests:


In [31]:
print_test_result(firstMissingPositive, [1,2,3], 4)
print_test_result(firstMissingPositive, [0,0,1], 2)
print_test_result(firstMissingPositive, [1,2], 3)
print_test_result(firstMissingPositive, [7,6,5,4,3,2], 1)
print_test_result(firstMissingPositive, [1,2,4], 3)


TEST PASSED
TEST PASSED
TEST PASSED
TEST FAILED (Got: 8 Expected: 1, Input was: [7, 6, 5, 4, 3, 2])
TEST PASSED

One test has failed; can you spot what the problem is?

When you have a failing test and you are not sure exactly what the problem is, a good thing to do is to make the test as simple as possible. Let's try that now.


In [32]:
print_test_result(firstMissingPositive, [5,4,3,2], 1)


TEST FAILED (Got: 6 Expected: 1, Input was: [5, 4, 3, 2])

Okay, we have simplified the test and it still fails. Let's make it even simpler!


In [38]:
print_test_result(firstMissingPositive, [2], 1)
print_test_result(firstMissingPositive, [], 1)


TEST FAILED (Got: 3 Expected: 1, Input was: [2])
TEST FAILED (Got: 2 Expected: 1, Input was: [])

And now it's not really possible to make the test any simpler. But by now the problem should be easy to understand: it seems as if the first number the function checks is 2, so if 1 is missing we fail.

Let's test that hypothesis by writing more tests!


In [39]:
print_test_result(firstMissingPositive, [1,2,3], 4)
print_test_result(firstMissingPositive, [1, 3], 2)
print_test_result(firstMissingPositive, [1], 2)


TEST PASSED
TEST PASSED
TEST PASSED

If you like, you can try to fix the function for homework.

Anyway, we built our own test framework for this example. It turns out that we do not need to do that: Python has a lot of testing frameworks that have been built by other developers far smarter than myself. So, instead of re-inventing the wheel, we should probably just learn one of these frameworks. There are several options, but in this lecture I shall cover 'doc testing' with Python's built-in doctest module.

Doctesting

You may remember docstrings, the text we put at the very start of a function. Well, to write doctests all we have to do is add tests to our docstrings. It's honestly as simple as that. Here is the syntax for a doctest:

""" 
>>> {function name} ( {function argument, if any} )
{expected result}
"""

Once you have done that, you'll need to copy & paste the code below to run the tests:


In [41]:
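def run_doctests():
    import doctest
    # With no arguments, testmod() runs every doctest found in the __main__ module
    # (in a notebook, that means the functions defined in the notebook so far).
    doctest.testmod()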

By default, if all your tests pass nothing will be printed, but should a doctest fail, Python will give you all the juicy details. Let's try it now:


In [42]:
def add(a, b):
    """    
    >>> add(10, 10)
    20
    """
    return a + b

run_doctests()

We ran the doctests, but since the test passed, nothing happened. Alright, let's see what happens on failure:


In [44]:
def run_all_the_tests():   
    """    
    >>> 1 + 1
    2
    
    >>> print(True)
    True
    
    >>> 20 + 2
    23  
    """   

    print("testing complete")
    
run_doctests()


**********************************************************************
File "__main__", line 9, in __main__.run_all_the_tests
Failed example:
    20 + 2
Expected:
    23  
Got:
    22
**********************************************************************
1 items had failures:
   1 of   3 in __main__.run_all_the_tests
***Test Failed*** 1 failures.

As you can see, Python ran three tests and one of them failed: it turns out 20 + 2 does not equal 23.

Overall, I'd recommend beginners use doctesting. It's fairly easy to use, and it allows you to quickly type out basic tests for your functions.
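Beyond checking returned values, a doctest can also check that an exception is raised, so the ZeroDivisionError test from earlier can live in divide's docstring too. The sketch below assumes you also want a pass/fail count rather than silence on success: `doctest_counts` is a hypothetical helper built on the standard library's `doctest.DocTestFinder` and `doctest.DocTestRunner`, and it returns a `TestResults(failed, attempted)` tuple. The `# doctest: +IGNORE_EXCEPTION_DETAIL` directive makes the exception example match on the exception type alone, so it does not depend on the exact wording of the error message.

```python
import doctest

def divide(a, b):
    """a, b are ints or floats. Returns a/b

    >>> divide(10, 2)
    5.0
    >>> divide(-9, 3)
    -3.0
    >>> divide(1, 0)  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
        ...
    ZeroDivisionError: division by zero
    """
    return a / b

def doctest_counts(func):
    """Hypothetical helper: run the doctests in func's docstring and return
    TestResults(failed, attempted). Failures are printed as usual; passing
    examples print nothing."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)

print(doctest_counts(divide))  # TestResults(failed=0, attempted=3)
```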

As our final exercise for today, let's convert our 'print_test_result' tests into doctests...


In [49]:
def firstMissingPositive_TESTS():
    """
    >>> firstMissingPositive([1,2,3])
    4
    >>> firstMissingPositive([0,0,1])
    2
    >>> firstMissingPositive([1,2])
    3
    >>> firstMissingPositive([1,2,4])
    3
    >>> firstMissingPositive([2])
    1
    """
    pass

# Now we run the tests...
import doctest
doctest.run_docstring_examples(firstMissingPositive_TESTS, globals(), verbose=True)


Finding tests in NoName
Trying:
    firstMissingPositive([1,2,3])
Expecting:
    4
ok
Trying:
    firstMissingPositive([0,0,1])
Expecting:
    2
ok
Trying:
    firstMissingPositive([1,2])
Expecting:
    3
ok
Trying:
    firstMissingPositive([1,2,4])
Expecting:
    3
ok
Trying:
    firstMissingPositive([2])
Expecting:
    1
**********************************************************************
File "__main__", line 11, in NoName
Failed example:
    firstMissingPositive([2])
Expected:
    1
Got:
    3
